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Abstract 

To make inferences about the shape of a population distribution, the widely popular mean 
regression model, for example, is inadequate if the distribution is not approximately Gaussian 
(or symmetric). Compared to conventional mean regression (MR), quantile regression (QR) 
can characterize the entire conditional distribution of the outcome variable, and is more ro¬ 
bust to outliers and misspecification of the error distribution. We present a likelihood-based 
approach to the estimation of the regression quantiles based on the asymmetric Eaplace dis¬ 
tribution (AED), which has a hierarchical representation that facilitates the implementation 
of the EM algorithm for the maximum-likelihood estimation. We develop a case-deletion di¬ 
agnostic analysis for QR models based on the conditional expectation of the complete-data 
log-likelihood function related to the EM algorithm. The techniques are illustrated with both 
simulated and real data sets, showing that our approach out-performed other common classic 
estimators. The proposed algorithm and methods are implemented in the R package ALDqr (). 


Keywords: Quantile regression model; EM algorithm; Case-deletion model; Asymmetric 
Eaplace distribution. 


1 Introduction 


QR models have beeome inereasingly popular sinee the seminal work of iKoenker & G Bassett (19781) . 
In eontrast to the mean regression model, QR belongs to a robust model family, whieh can give a n 
overall assessment of the covariate effects at different quantiles of the outcome (IKoenker. 20051) . 

In particular, we can model the lower or higher quantiles of the outcome to provide a natural 
assessment of covariate effects specific for those regression quantiles. Unlike conventional mod¬ 
els, which only address the conditional mean or the central effects of the covariates, QR models 
quantify the entire conditional distribution of the outcome variable. In addition, QR does not 
impose any distributional assumption on the error, except requiring the error to have a zero con¬ 
ditional quantile. The foundations of the methods for independent data are now consolidated, and 
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some statistical methods for estimating and drawing inferences about conditional quantiles are 
provided by most of the available statistical programs (e.g., R, SAS, Matlab and Stata). For in- 
stance , just to name a few, in the well-known R package quantregO is implemented a variant 
of the B^rodale & Roberts (1977b simplex (BR) for linear programming problems described in 


Koenker & d’Orev (1987h . where the standard errors are computed by the rank inversion method 


(IKoenker. 2005h . Another method impl emented in this po pular package is Lasso Penalized Quan¬ 
tile Regression (LPQR), introduced by Tibshirani (1996h . where a penalty parameter is specified 
to determine how much shrinkage occur s in the estimation process. QR can be implemented in a 
range of different ways. Koenker (2005b provided an overview of some commonly used quantile 
reg ression techniques from a "classical" framework. 


Kottas & Gelfand (200 ih considered median regression from a Bayesian point of view, which 


is a special case of quantile regression, and discussed non-parametric modeling for the error 
distribution based on either Poly a tree or Dirichlet process priors. Regarding general quantile 
regression. IYu & Moveed (20011) proposed a Bayesian modeling approach by using the ALD, 
Kottas & Kmjajic (2009h developed Bayesian semi-para metric models for quant ile regression us¬ 


ing Dirichlet process mixtures for the error distribution, iGeraci & Bottai (2007h studi ed quantile 
regression for longitudinal data using the ALD. Recently, Kozumi & Kobavashi (201 ll) developed 
a simple and efficient Gibbs sampling algorithm for fitting the quantile regression model based on 
a location-scale mixture representation of the ALD. 

An interesting aspect to be considered in statistical modelling is the diagnostic analysis. This 
can be carried out by conducting an influence analysis for detecting influential observations. One 
of the most technique to detect influential observations is the case-deletion approach. The famous 
approach of Cook (1977) has been applied extensively t o assess the influence of an observation in 
fitting a statistical model; see ICook & Weisberg (19821) and the references therein. It is difficult 
to apply this approach directly to the Q R model because the underlying observed-data likelihood 


function is not differentiable at zero. IZhu e.t al. (200 Ih presents an approach to perform diag¬ 


nostic analysis for general statistical models that is based on the Q-displacement function. This 
approach has be en applied succe ssfully to perform influence analysis in several regression m od- 
els, for example, Xie etal. (2007 ) considered in multivariate t distrib ution, Matos et al. (20131) ob¬ 
tained case-deletion meas ures for mixed-effects models following the lZhu e.t al. tlOOlh ’s approach 
and in Zeller gt a/. (2010h we can see some results about local influence for mixed-effects models 
obtained by using the Q-displacement function. 

Taking advantage of the likelihood structure imposed by the ALD, the hierarchical representa¬ 
tion of the ALD, we develop here an EM-type algorithm for obtaining the ML estimates at the pth 
level, and by simulation studies our EM algorithm outperformed the competing BR and EPQR al¬ 
gorithms, where the standard error is obtained as a by-product. Moreover, we obtain case-deletion 
measures for the QR model. Since QR methods complement and improve established means re¬ 
gression models, we feel that the assessment of robustness aspects of the parameter estimates in 
QR is also an important concern at a given quantile level p G (0,1). 

The rest of the paper is organized as follows. Section 2 introduces the connection between 
QR and AED as well as outlining the main results related to AED. Section 3 presents an EM- 
type algorithm to proceed with ME estimation for the parameters at the pth level. Moreover, the 
observed information matrix is derived. Section [3] provides a brief sketch of the case-deletion 
method for the model with incomplete data, and also develop a methodology pertinent to the AED. 
Sections |4] and |5] are dedicated to the analysis of real and simulated data sets, respectively. Section 
6 concludes with a short discussion of issues raised by our study and some possible directions for 
the future research. 
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2 The quantile regression model 

Even though considerable amount of work has been done on regression models and their exten¬ 
sions, regression models by using asymmetric Laplace distribution have received little attention in 
the literature. Only recently, the a study on quantile regression model based on asymmetric Laplace 
distribution was presented by Tian et al. (2014) who a derived several interesting and attractive 
properties and presented an EM algorithm. Before presenting our derivation, let us recall firstly 
the definition of the asymmetric Laplace distribution and after this, we will present the quantile 
regression model. 


2.1 Asymmetric Laplace distribution 


As discussed in I Yu & Moyeed (200 Ih . we say that a random variable Y is distributed as an ALD 
with location parameter /i, scale parameter <7 > 0 and skewness parameter p G (0,1), if its proba¬ 
bility density function (pdf) is given by 




p{\-p) 


exp{-Pp(^^)}, 


( 1 ) 


where Pp(.) is the so called check (or loss) function defined by pp{u) = u{p — I{u < 0}), with I{.} 
denoting the usual indicator function. This distribution is denoted by ALD{iJ.,o,p). It is easy to 
see that W = Pp{^-^) follows an exponential distribution exp(l). 

A stochastic representation is useful to obtain some properties of the distribution, as for ex- 
ample, the mome n ts, moment generating function ( mgf) , and estimation al gorithm. Lor the ALD 
Kotz et al. 12001 '). Kuzobowski & Podgorski 12000 1 and Zhou et al. 12013 1 presented the follow¬ 
ing stochastic representation: Let U ~ exp (a) and Z ~ N{0, 1) be two independent random vari¬ 
ables. Then, Y ~ ALD{p,o,p) can be represented as 


Y = p + ^pUA- XpV^Z, 


( 2 ) 


where = 


i~2p 

pA-p) 


and Tp = p(^i_py and = denotes equality in distribution. 


Ligure [U shows 

how the skewness of the ALD changes with altering values for p. Lor example, for p = 0.1 almost 
all the mass of the ALD is situated in the right tail. Lor p = 0.5, both tails of the ALD have 
equal mass and the distribution then equals the more common double exponential distribution. In 
contrast to the normal distribution with a quadratic term in the exponent, the ALD is linear in the 
exponent. This results in a more peaked mode for the ALD together with thicker tails. On the other 
hand, the normal distribution has heavier s houlders compared to th e ALD. Lrom dH), we have the 
hierarchical representation of the ALD, see Lum & Gelfand (2012h . given by 


Y\U = u 
U 


N{p + TApU,Tpau), 
exp{o). 


(3) 

(4) 


This representation will be useful for the implementation of the EM algorithm. Moreover, since 
Y\U = M ~ N{p + '&pU, TpOu), then one can derive easily the pdf of Y. That is, the pdf in ([B can 
be expressed as 




1 1 

-^exp 

XpO^ 



(5) 
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Figure 1: Standard asymmetric Laplace density 


where 5(y) = 7 = ^^(2 + ^) = ^ and A(y) = 2(^^) ' ^i/ 2 ( 5 (y)r). with K^{.) 

being the modified Bessel function of the third kind. It easy to see that that the conditional distri¬ 
bution of U , given 7 = y, is t/1(7 = y) ~ GIG{\, d,Y)- Here, GIG(v,a,b) denot es the Generalized 
Inverse Gaussian (GIG) distribution; see iBarndorff-Nielsen & Shephard ('20011) for more details. 
The pdf of GIG distribution is given by 


h(u\.,a,b)- 


2Kv{ab) 

The moments of U can be expressed as 


exp 


-[a^/u + b^u)^, u > 0, V G M, a, 


b>0. 


£[(/»] 


k G 


Kv{ab) 

Some properties of the Bessel function of the third kind Kx{u) that will be useful for the devel¬ 
opments here are: (i) Kv{u) = K^y{u); (ii) Ky^i{u) = ^Ky{u) +Ky^i{u); (iii) for non-negative 

integer r, K^+i/xiu) = y^exp(-M) A special case is K^/xiu) = 


2.2 Linear quantile regression 

Let yi be a response variable and x, a k x 1 vector of covariates for the fth observation, and let 
Qy.{p\xi) be the pth (0 < p < 1) quantile regression function of y,- given x/, i = 1,... . Suppose 

that the relationship between Qy.{p\xi) and x,- can be modeled as Qy.{p\xi) = xjP^, where Ppisa 
vector (kx 1) of unknown parameters of interest. Then, we consider the quantile regression model 
given by 

yi = ^JPp + £i^ ( 6 ) 
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where e, is the error term whose distribution (with density, say, fp{.)) is restrieted to have the pth 
quantile equal to zero, that is, J^^fp{£i)d£i = p. The error density fp{.) is often left unspecified 
in the classical literature. Thus, quantile regression estimation for ^ proceeds by minimizing 


^ n 

^p = argminQ -x^j3 

’^p i=\ 


(7) 


where Pp(.) is as in ([I]) and (3^ is the quantile regression estimate for at the pth quantile. The 
special case p = 0.5 corresponds to median regression. As the check function is not differen¬ 
tiable at zero, we cannot derive explicit solutions to the minimization problem. Therefore, linear 
programming methods are commonly applied to obtain quantile regression estimates for A 
connection between the minimization of the sum in ([7]) and the maximum-likelihood theory is pro¬ 
vided by the ALD; see iGeraci & Bottai (2007h . It is also true that under the quantile regression 
model, we have 


Wi = ^pp{yi-xJPp) r^e\p{\). ( 8 ) 

The above result is useful to check the model in practice, as will be seen in the Application Section. 
Now, supposeyi,... ,y„ are independent observations such as T, ~ ALD(x7CTjp), z = 1,... ,n. 

Then, from ([5]) the log-likelihood function for 6 = can be expressed as 


z(e) = £z,(S). 

(=1 


(9) 


where £i{9) = c — jlog a + ^{yt — xj -I-log (A/), with c is a constant does not depend on 0 

andAi = 2 {^y^^Ki/ 2 {?ii) = ^exp(-A/), with 5, = 5(y/) = \yi-xjPp\/Tpy/o) and A,-= dif. 

Note that if we consider a as a nuisance parameter, then the maximization of the likelihood in 
@ with respect to the parameter is equivalent to the minimization of the objective function in 
dV]). and hence the relationship between the check function and ALD can be used to reformulate 
the QR method in the likelihood framework. 

The log-likelihood function is not differentiable at zero. Therefore, standard procedures the 
estimation can not be developed following the usual way. Specifically, the standard errors for the 
maximum likelihood estimates is not based on the genuine information matrix. To overcome this 
problem we consider the empirical information matrix as will be described in the next Subsection. 


2.3 Parameter estimation via the EM algorithm 

In this section, we discuss an estimation method for QR based on the EM algorithm to obtain ML 
estimates. Also, we consider the method of moments (MM) estimators,which can be effectively 
used as starting values in the EM algorithm. Here, we show how to employ the EM algorithm for 
ML estimation in QR model under the ALD. Erom the hierarchical representation ©-(H]), the QR 
model in ® can be presented as 

Yi\Ui = Ui ~ N{xJPp + TdpUi,XpOUi), (10) 

Ui ~ exp(c7), z = (11) 

where d-p and Tp are as in @. This hierarchical representation of the QR model is convenient to 
describe the steps of the EM algorithm. Let y = (yi, • • ■ and u = (zzi,..., zz„) be the observed 
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data and the missing data, respectively. Then, the complete data log-likelihood function of 0 = 
()3 J, c)^, given (y, u), ignoring additive constant terms, is given by 4(0 |y, u) = E4i 4(0 bi, w,), 
where 

4(0|y/,M/) = -^l0g(2;rT2) _ 3 _ llog(M,-) - -^u^\y--xj p-l^pUi)^ - ^Ui, 

for / = 1,... ,n. In what follows the superscript (k) indicates the estimate of the related parameter 
at the stage k of the algorithm. The E-step of the EM algorithm requires evaluation of the so-called 
Q-function Q{0\0^’^^) = E^(k) [4(0 ly,u) ly, where Eg(j:) [.] means that the expectation is being 

effected using 0^^^ for 0. Observe that the expression of the Q-function is completely determined 
by the knowledge of the expectations 


4 .( 0 ('^^)=E[t/f|y,-, 0 (^)], 5 = - 1 , 1 , 


( 12 ) 


that are obtained of properties of the G7G(0.5, a, Z?) distribution. Eet^^^^ = (<4i (0*^^^), ■ ■ •,'4«(0*^*^))^ 
be the vector that contains all quantities defined in (fT^ . Thus, dropping unimportant constants, 

the Q-function can be written in a synthetic form as 2(0|0) = Yfi=i 2/(010)> where 


a(e|e) = -^iog<7-^ 


.^- 1 ,(«“’)(« - - 2{y, - + 4ii(9'‘’)Ti 


(13) 


This quite useful expression to implement the M-step, which consists of maximizing it over 0. So 
the EM algorithm can be summarized as follows: 

E-step: Given 0 = 0^^), compute <4/(0*^^^) through of the relation 




:W 


K,/2pP) 


,5 = -l,l. 


(14) 


where ^ y(*) = and 

wnereo, ,7 ana a, o, 7 , 

M-step: Update 9^^'^ by maximizing 2(010*^^^) over 0, which leads to the following expressions 


jS 


(<^+i) _ 

p 

(k+i) - 


= (xTd(^«)X) 'xT(d(^W)Y-V n), 




1_ |^g(j3(*^+') _OlT^v_ _l4iT>:W 


3?it2 


),^W)-2lT(Y-Xi5('+iVp + fl,i4f 


where D{a) denotes the diagonal matrix, with the diagonal elements given by a = (ai,... ,ap)^ 
and 2(0, ^_i) = (Y — Xj3)^D(|_j)(Y — X)3). A similar expression for is obtained in 

Tian et al. (2013b . This process is iterated until some distance involving two successive evalua¬ 
tions of the actual log-likelihood £(0), like ||£(0^^'^^^) — £(0*'*^)|| or ||£(0)^'^^))/£(0^^)) — 1 ||, is 
small enough. This algorithm is implemented as part of the R package ALDqrO, which can be 
cost from the repository CRAN. Eurthermore, following the results given in 
, the MM estimators for and o are solutions of the following equations: 


downloaded at not 


Yu & Zhang 12005 


Pp„=(xTx) 'xT(Y- 5„V") and5M=lEpp(}-i-x7gp*,), 

^ 1=1 


( 15 ) 
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where -d-p is as dH). Note that the MM estimators do not have explieit elosed form and numerieal 
proeedures are needed to solve these non-linear equations. They ean be used as initial values in the 
iterative proeedure for eomputing the ML estimates based on the EM-algorithm. Standard errors 
fo r the maximum li kelihood estimates is based on the empirieal information matrix, that aeeording 


to 


Meilijson (1989h formula, is defined as 


L(0)= 


(16) 


./=i 


where S(yy|0) = T!j=i s(y; |0). It is noted from the result of Louis tl982h that the individual seore 

can be determined ass(yy|0) = d\og f{yj\6)/dd = E{decj{B\yj,Ui)/d0\yj,ey Asymptotic con¬ 
fidence intervals and tests of the parameters at the pth level can be obtained assuming that the ML 
estimator 0 has approximately a normal multivariate distribution. 

From the EM algorithm, we can see that is inversely proportional to di = |y/— 

Hence, Ui{0^^^) ^ S’-u{0^^^) can be interpreted as a type of weight for the fth case 

in the estimates of which tends to be small for outlying observations. The behavior of these 
weights can be used as tools for identifying outlying observations as well as for showing that we 
are considering a robust approach, as will be seen in Sections 4 and 5. 


3 Case-deletion measures 


Case-deletion is a classical approach to study the effects of dropping the fth case from the data 
set. Eet yc = (y,u) be the augmented data set, and a quantity with a subscript “[z]” denotes the 
original one with the zth observation deleted. Thus, The complete-data log-likelihood function 

based on the data with the zth case deleted will be denoted by 4(0|yc[!])- Let 

be the maximizer of the function = E^ [£c(0|Yc[,])|y], where 0 = (j3 ,(7^)^ is the ML 

estimate of 0. To assess the influence of the zth case on 0, we compare the difference between 
0[,] and 0. If the deletion of a case seriously influences the estimates, more attention needs to 
be paid to that case. Hence, if 0 [;■] is far from 0 in some sense, then the zth case is regarded as 
influential. As 0[;] is needed for every case, the required computational effort can be quite heavy, 

esneciallv when the sample size is large. Hence, To calculate the case-deletion estimate 0 H of 0, 
fsee lZhu et ah. 200 ih proposed the following one-step approximation based on the Q-function, 


■9w = »+{-e(«i«)} ‘ei.](8|8), 


where 


^2^(010), . dQu](0\0] 

and e[,.|(eie) = 


( 90(90 


10 = 0 ’ 


(17) 


(18) 


are the Hessian matrix and the gradient vector e yaluated at 0, res pectively. The Hessian matrix is 
an essential element in the method developed by Zhu et al. (200 lb to obtain the measures for case- 
deletion diagnosis. For developing the case-deletion measures, we have to obtain the elements in 
(fTTI) . 2|;](0|0) and 2(010). These formulas can be obtained quite easily from (fT3]) : 
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1. The components of 2[,] (010) are 


A 5Qh(»I»)| U 


and 


^eM( 0 | 0 ) 


1 




where 




/T-Z 

> Mi L 


;W 




and 


= I 

Mi L 


- P p)^ - ^(yj -^J P p)^p + 


(19) 

( 20 ) 


2. The elements of the second order partial derivatives of 2(010) evaluated at 0 are 

Qp(o\e) = 


C7T 


12xTd(5<‘1)x, 


&(8|9)} = 


and Qp^{e\e)} = 0. 


I 


4o^ lo^xj 


e (/5.« i‘l) - 21 J (Y - X13) l»p + 5 S' 
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In the following result, we will obtain the one-step approximation of 0 [;■] = %■]) ,(=l,...,n 

based on (fT71) . viz., the relationships between the parameter estimates for the full data set and the 
data with the ith case deleted. 

Theorem 3.1. For the QR model defined in t l7^) and ([77]), the relationships between the parameter 
estimates for full data set and the data with the ith case deleted are as follows: 

fplH = P„ + t)(X^D(|_,)X)''£i[.-| and ?[,| = ?--L(e„( 0 |g))^‘£ 2 [,|, 

2(72 Y 2 

where and £' 2 [/] <25 in ( 1791) and ( |2^ . respectively. 

To asses the influence of the /th case on the ML estimate 0, we compare 0 m and 0 based on 


metrics, proposed by lZhu et al. (200 m . for measuring the distance between 0[,] and 0. For that, 
we consider here the following; 


1. Generalized Cook distance: 

GD, = (0[,] - 0)T{ - 2(010)}(0[,-] - 0), ) = 1,..., 

Upon substituting ([TV]) into (l2T]) . we obtain the approximation 

gd 1 = 4.](0|0)T{-2(0|0)}"^2h(0|0), (= 1 ,..., 


n. 


( 21 ) 


n. 
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As Q{6\d) is a diagonal matrix, one can obtain easily a type of Generalized Cook distance 
for parameters j3 and o, respectively, as follows 


GD‘((j) = e|i,„(8|8)T{-e„(eie)}“‘e[,]„(§|8), ,= 


2. Q-distance: This measure of the influe nce of the dh case is base d on the Q-distance function, 
similar to the likelihood distance LDi dCook & Weisberg. 1982h . defined as 


2A=2{e(0|0)-e(0[,]|0)}. (22) 

We can calculate an approximation of the likelihood displacement QDj by substituting (fT71) 
into (l22l) . resulting in the following approximation QDj of QDi: 


QDj=2{Q{d\d)-Q{eli^\e)}. 


4 Application 

We illustrate the proposed methods by applying them to the Australian Institute of Sport (AIS) 
data, analyzed by Cook and Weisberg (1994) in a normal regression setting. The data set consists 
of several variables measured mn = 202 athletes (102 males and 100 females). Here, we focus on 
body mass index (BMI), which is assumed to be explained by lean body mass (LBM) and gender 
(SEX). Thus, we consider the following QR model: 


BMIi — /3q A PiLBMi -1- ^SEX^i + £;, i — 1,..., 202 , 

where £; is a zero p quantile. This model can be fitted in the R software by using the package 
quantregO, where one can arbitrarily use the BR or the LPQR algorithms. In order to compare 
with our proposed EM algorithm, we carry out quantile regression at three different quantiles, 
namely p = {0.1,0.5,0.9} by using the AED distribution as described in Section 2. The ME 
estimates and associated standard errors were obtained by using the EM algorithm and the observed 
information matrix described in Subsections 2.3, respectively. Table [T| compares the results of our 
EM, BR and the EPQR estimates under the three selected quantiles. The standard error of the 
EPQR estimates are not provided in the R package quantregO and are not shown in Table [B 
Prom this table we can see that estimates under the three methods only exhibit slight differences, 
as expected. However, the standard errors of our EM estimates are smaller than those via the BR 
algorithm. This suggests that the EM algorithm seems to produce more accurate estimates of the 
regression parameters at the pth level. To obtain a more complete picture of the effects, a series of 
QR models over the grid p = (0.1,0.15,... ,0.95} is estimated. Pigure[2]gives a graphical summary 
of this analysis. The shaded area depicts the 95% confidence interval from all the parameters. Prom 
Pigure[2]we can observe some interesting evidences which cannot be detected by mean regression. 
Por example, the effect of the two variables (EBM and gender) become stronger for the higher 
conditional quantiles, indicating that the BMI are positively correlated with the quantiles. The 
robustness of the median regression (p = 0.5) can be assessed by considering the influence of a 
single outlying observation on the EM estimate of 0. In particular, we can assess how much the 
EM estimate of 0 is influenced by a change of d units in a single observation y,-. Replacing y, by 
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Table 1: AIS data. Results of the parameter estimation via EM, Barrodale and Roberts (BR) and 
Lasso Penalized Quantile Regression (LPQR) algorithms for three selected quantiles. 




EM 

BR 


EPQR 

p 

Parameter 

MLE 

SE 

Estimative 

SE 

Estimative 

0.1 

Po 

9.3913 

0.7196 

9.3915 

1.2631 

9.8573 


Pi 

0.1705 

0.0091 

0.1705 

0.0160 

0.1647 


Pi 

0.8312 

0.2729 

0.8209 

0.4432 

0.6684 


o 

0.2617 

0.0252 

1.0991 


1.0959 

0.5 

A) 

7.6480 

0.8717 

7.6480 

1.1120 

7.6480 


Pi 

0.2160 

0.0116 

0.2160 

0.0159 

0.2160 


Pi 

2.2499 

0.3009 

2.2226 

0.4032 

2.2226 


a 

0.6894 

0.0590 

0.6894 


0.6894 

0.9 

Po 

5.8000 

0.5887 

5.8000 

1.6461 

6.0292 


Pi 

0.2700 

0.0084 

0.2700 

0.0256 

0.2678 


Pi 

3.9596 

0.1937 

3.9658 

0.6203 

3.8271 


a 

0.3391 

0.0258 

1.2677 


1.2767 






Figure 2: AIS data: ML estimates and 95% confidence intervals for various values of p. 


yi(S) = ji + 8sd{y), where sd{.) denotes the standard deviation. Let /3/(5) be the EM estimates of 
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Figure 3: Percentage of change in the estimation of /3o, /3i and P 2 in comparison with the true 
value, for median {p = 0.5) and mean regression, for different contaminations d. 
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Figure 4: AIS data: Q-Q plots and simulated envelopes for mean and median regression. 


Pj after contamination, j = 1,2,3. We are particularly interested in the relative changes | (/3j(5) — 
/3j)//3/|. In this study we contaminated the observation corresponding to individual {#146} and 
for d between 0 and 10. Figure [3] displays the results of the relative changes of the estimates 
for different values of 5. As expected, the estimates from the median regression model are less 
affected by variations on 5 than those of the mean regression. Moreover, Figure |4] shows the Q-Q 
plot and envelopes for mean and median regression, which are obtained based on the distribution 
of Wi, given in dS]), that follows exp(l) distribution. The lines in these figures represent the 5th 
percentile, the mean and the 95th percentile of 100 simulated points for each observation. These 
figures clearly show that the median regression distribution provides a better-fit than the standard 
mean regression to the AIS data set. 

As discussed at the end of Section 2.3 the estimated distance J, = |y/ — xjPp\/o can be used 
efficiently as a measure to identify possible outlying observations. Figure [5}left panel) displays 
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Figure 5: AIS data: Index plot of the distanee di and the estimated weights ui. 


the index plot of the distanee di for the median regression model (p = 0.5). We see from this 
figure that observations #75, #162, #178 and #179 appear as possible outliers. From the EM- 
algorithm, the estimated weights Ui{6) = (S‘si{B) for these observations are the smallest ones (see 
right panel in Figure [5]), eonfirming the robustness aspects of the maximum likelihood estimates 
against outlying observations of the QR models. Thus, larger di implies a smaller Ui{6), and the 
estimation of 9 tends to give smaller weight to outlying observations in the sense of the distance 


di. 

Figure [6] shows the estimated quartiles of two levels of gender at each LBM point from our EM 
algorithm along with the estimates obtained via mean regression. Erom this figure we can see clear 
attenuation in /3i due to the use of the median regression related to the mean regression. It is possi¬ 
ble to observe in this figure some atypical individuals that could have an influence on the ME esti¬ 
mates for different values of quantiles. In this figure, the individuals #75, #130, #140 #162, #160 
and #178 were marked since they were detected as potentially influential. 

In order to identify influential observations at different quantiles when some observation is 
eliminated, we can generate graphs of the generalized Cook distance GD^, as explained in Section 
|3l A high value for GD\ indicates that the fth o bservation has a hig h impact on the maximum 
likelihood estimate of the parameters. Eollowing iBarros et al. (20101) . we can use 2{p + \)/n as 
benchmark for the GD[ at different quantiles. Eigure|7] (first row) presents the index plots of GD\. 
We note from this figure that, only observation #140 appears as influential in the ML estimates at 
p = 0.1 and observations #75, #178 as influential at p = 0.5, whereas observations #75, #162,#178 
and #179 appear as influential in the ML estimates at p = 0.9. Eigure|7] (second row) presents the 
index plots of QD\. Erom this figure, it can be noted that observations #76,#130,#140 appear to 
be influential at p = 0.1, whereas observations #75,#162 and #178 seem to be influential in the 
ML estimates at p = 0.1, and in addition observation #179 appears to be influential at p = 0.9. 
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Figure 6: AIS data: Fitted regression lines for the three seleeted quantiles along with the mean 
regression line. The influential observations are numbered. 


p=0.S 





p=0.1 


p=0.5 


p=0.9 


o ° ° 

' ° °0 ® ° O o 

’ °°° o 

os'S.°°I°oS° SLi® Vo 


„ ® o 

o OOOo 


OcPO„ 0°00@o5300 
° ‘^00° 0,8 ° 

Jbhfi 0°° 


~r~ 



Figure 7: Index plot of (first row) approximate likelihood distanee GDj. (second row). Index plot 
of approximate likelihood displacement QDJ . The influential observations are numbered. 
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5 Simulation studies 


In this section, the results from two simulation studies are presented to illustrate the performance 
of the proposed method. 

5.1 Robustness of the EM estimates (Simulation study 1) 

We conducted a simulation study to assess the performance of the proposed EM algorithm, by 
mimicking the setting of the AIS data by taking the sample size n = 202. We simulated data from 
the model 


Ji = pi + p2Xi2 + p3Xi3 + £/, i = 1,..., 202, (23) 

where the x'ljS are simulated from a uniform distribution (U(0,1)) and the errors e,y are simu¬ 
lated from four different distributions: (i) the standard normal distribution A(0,1), (ii) a Student-t 
distribution with three degrees of freedom, ^ 3 ( 0 ,1), (Hi) a heteroscedastic normal distribution, 
(1 -|-a:, 2 )A^( 0 , 1) and, (iv) a bimodal mixture distribution 0 . 6 t 3 (—20,1) 3-0.4^3(15,1). The true val¬ 
ues of the regression parameters were taken as /3i = /32 = /33 = 1. In this way, we had four settings 
and for each setting we generated 10000 data sets. 

Once the simulated data were generated, we fit a QR model, with p = 0.1, 0.5 and 0.9, under 
Barrodale and Roberts (BR), Lasso (Lasso) and EM algorithms by using the "quantregO" package 
and our ALDqr () package, from the R language, respectively. Lor the four scenarios, we computed 
the bias and the square root of the mean square error (RMSE), for each parameter over the M = 
10,000 replicas. They are defined as: 

Bias(Y) = 7 —yand RMSE(y) = \JSE{'y)'^ -\-Bias{y)'^ (24) 

where 7 = ^ Y!f=\ Yi and SE(y)^ = ^ , with 7 = /3i, p 2 , p 3 or o, 7 is the estimate of 

7 obtained in replica i and 7 is the true value. Table [2]reports the simulation results for p = 0.1, 0.5 
and 0.9. We observe that the EM yields lower biases and RMSE than the other two estimation 
methods under all the distributional scenarios. This finding suggests that the EM would produce 
better results than other alternative methods typically used in the literature of QR models. 

5.2 Asymptotic properties (Simulation study 2) 

We also conducted a simulation study to evaluate the finite-sample performance of the parameter 
estimates. We generated artificial samples from the regression model (l23l) with Pi = P 2 = P 3 = ^ 
and Xij ~ f/(0,1). We chose several distributions for the random term e, a little different than 
the simulation study 1, say, (i) normal distribution N(0,2) (Nl), (ii) a Student-t distribution 
^ 3 ( 0 , 2 ) (Tl), (Hi) a heteroscedastic normal distribution, (1 -1 -vq)A^( 0,2) (N2) and, (fv) a bimodal 
mixture distribution 0 . 6 t 3 (—20,2)-f 0.4^3(15,2) (T2). Einally, the sample sizes were fixed at 
n = 50,100,150,200,300, 400,500,700 and 800. 

Eor each combination of parameters and sample sizes, 10000 samples were generated under the 
four different situations of error distributions (Nl, Tl, N2, T2). Therefore, 36 different simulation 
runs are performed. Once all the data were simulated, we fit the QR model with p = 0.5 and the 
bias (l24l) and the square root of the mean square error (l24l) were recorded. The results are shown 
in Eigure[ 8 l We can see a pattern of convergence to zero of the bias and MSE when n increases. 


14 




Table 2: Simulation study. Bias and root mean-squared error (RMSE) of J3 under different error 
distributions. The estimates under Barrodale and Roberts (BR) and Lasso (Lasso) algorithms were 
obtained by the "quantregO" paekage from the R language. 






p2 




Method 

P 

Bias 

RMSE 

Bias 

RMSE 

Bias 

RMSE 

e~A^(0,l) 

BR 

0.1 

-1.2639 

1.3444 

0.0076 

0.5961 

-0.0030 

0.5934 


0.5 

0.0064 

0.3376 

-0.0048 

0.4390 

-0.0051 

0.4453 


0.9 

1.2640 

1.3460 

0.0030 

0.6051 

0.0069 

0.6039 

LPQR 

0.1 

-0.9664 

1.0464 

-0.3072 

0.6165 

-0.3110 

0.6187 


0.5 

0.1474 

0.3628 

-0.1463 

0.4534 

-0.1462 

0.4576 


0.9 

1.5901 

1.6460 

-0.3164 

0.6173 

-0.3076 

0.6179 

EM 

0.1 

-1.2551 

1.3362 

-0.0055 

0.5964 

-0.0090 

0.6020 


0.5 

0.0040 

0.3286 

-0.0050 

0.4332 

-0.0031 

0.4363 


0.9 

1.2694 

1.3484 

-0.0071 

0.6019 

-0.0120 

0.5955 

e~f3(0,l) 

BR 

0.1 

-1.2446 

1.3364 

-0.0290 

0.6274 

-0.0313 

0.6259 


0.5 

0.1049 

0.4870 

0.1213 

0.6714 

0.1123 

0.6708 


0.9 

2.3618 

2.8408 

1.0056 

2.4928 

0.9459 

2.4332 

LPQR 

0.1 

-0.9315 

1.0219 

-0.3478 

0.6422 

-0.3412 

0.6354 


0.5 

0.3007 

0.5410 

-0.0928 

0.6310 

-0.0831 

0.6237 


0.9 

3.0443 

3.2880 

0.1911 

1.6375 

0.2231 

1.6601 

EM 

0.1 

-1.2287 

1.3213 

-0.0402 

0.6209 

-0.0374 

0.6265 


0.5 

0.0965 

0.4866 

0.1352 

0.6789 

0.1304 

0.6758 


0.9 

2.3781 

2.8459 

0.9464 

2.4082 

0.9264 

2.4167 

e~ (l-h.x2)A^(0,l) 

BR 

0.1 

-1.2869 

1.4256 

0.0130 

0.8706 

-1.2554 

1.5381 


0.5 

-0.0051 

0.4468 

0.0049 

0.6336 

0.0061 

0.6509 


0.9 

1.2868 

1.4259 

0.0018 

0.8686 

1.2307 

1.5256 

LPQR 

0.1 

-1.1393 

1.2272 

-0.3694 

0.7773 

-1.1450 

1.2756 


0.5 

0.1834 

0.4520 

-0.1906 

0.6193 

-0.1963 

0.6304 


0.9 

1.6972 

1.7933 

-0.3621 

0.7925 

0.7494 

1.1587 

EM 

0.1 

-1.2772 

1.4140 

0.0051 

0.8646 

-1.2341 

1.5195 


0.5 

0.0954 

0.4892 

0.1289 

0.6724 

0.1316 

0.6694 


0.9 

1.2599 

1.3987 

0.0076 

0.8723 

1.2488 

1.5315 

e-0.6f3(-20,l)-h0.4f3(15,l) 

BR 

0.1 

-1.2350 

1.3268 

-0.0395 

0.6160 

-0.0396 

0.6192 


0.5 

0.1029 

0.4896 

0.1214 

0.6780 

0.1212 

0.6741 


0.9 

2.3857 

2.8737 

0.9657 

2.4574 

0.9558 

2.4585 

LPQR 

0.1 

-0.9664 

1.0464 

-0.3072 

0.6165 

-0.3110 

0.6187 


0.5 

0.1474 

0.3628 

-0.1463 

0.4534 

-0.1462 

0.4576 


0.9 

1.5901 

1.6460 

-0.3164 

0.6173 

-0.3076 

0.6179 

EM 

0.1 

-0.9327 

1.0201 

-0.3491 

0.6433 

-0.3355 

0.6372 


0.5 

0.2880 

0.5343 

-0.0745 

0.6216 

-0.0717 

0.6159 


0.9 

3.0624 

3.3102 

0.1702 

1.6627 

0.2221 

1.6575 
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Figure 8 : Simulation study 2. Average bias (first column) and average MSE (second column) of 
the estimates of diSi, k with p = 0.5 (median regression), where Nl = A(0,2), Tl = ^ 3 ( 0 ,2), 
A/2 = (1 +;.,)W(b, 2) and r2 = (-20.2)+0.4(15,2) . 


As a general rule, we can say that bias and MSE tend to approach to zero when the sample size 
increases, indicating that the estimates based on the proposed EM-type algorithm do provide good 
asymptotic properties. This same pattern of convergence to zero is repeated considering different 
levels of the quantile p. 


6 Conclusion 

We have studied a likelihood-based approach to the estimation of the QR based on the asymmet¬ 
ric Laplace distribution (ALD). By utilizing the relationship between the QR check function and 
the ALD, we cast the QR problem into the usual likelihood framework. The mixture represen- 
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tation of the ALD allows us to express a QR model as a normal regression model, facilitating 
the implementation of an EM algorithm, which naturally provides the ML estimates of the model 
parameters with the observed information matrix as a by product. The EM algorithm was im¬ 
plemented as part of the R package ALDqr(). We hope that by making the code of our method 
available, we will lower the barrier for other researchers to use the EM algorithm in their studies 
of quantile regression. Eurther, we presented diagnostic analy sis in QR models, whic h was based 
on the case-deletion technique suggested by Izhu et al. (20011) and Izhu & T,ee (20011). whic h are 
the counterp arts for missing data models of the well-known ones proposed by ICook (1977h and 
Cook (1986h . The simulation studies demonstrated the superiority of the proposed methods to the 


existing methods, implemented in the package quantreg (). We applied our methods to a real data 
set (freely downloadable from R) in order to illustrate how the procedures can be used to identify 
outliers and to obtain robust ML parameter estimates. Erom these results, it is encouraging that the 
use of ALD offers a better alternative in the analysis of QR models. 

Einally, the proposed methods can be extended to a more general framework, such as, cen¬ 
sored (Tobit) regression models, measurement error models, nonlinear regression models, stochas¬ 
tic volatility models, etc and should yield satisfactory results at the expense of additional com¬ 
plexity in implementation. An in-depth investigation of such extensions is beyond the scope of the 
present paper, but these are interesting topics for further research. 
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